Method for sorting data in a computer at high speed by using data word values for address locations

ABSTRACT

The invention comprises a method of completely sorting an unsorted data table in a single pass through the data. The method uses a relatively large amount of computer memory, but sorts the data at great speed. Specifically, each unsorted data word is scaled to a size equal to or less than the number of address locations in a sort table. The scaled value created for each unsorted data word is converted to an address increment which is added to the initial address of the sort table. The unsorted data word is then stored into the sort table at the above calculated address that is related to the value of the data word. In this way, most words are placed in a sorted arrangement without comparison or iteration. The invention also comprises a method for resolving conflicts where a calculated address for an unsorted data word already contains a sorted data word. In this case both words are then sorted with respect to each other to create a data subclass and stored in an auxiliary storage table. In the case of subsequent conflicts, conflicting data is sorted and then moved en masse to yet a further location in auxiliary storage. A further element is a method of gathering and merging the data which has been sorted as described above. The gathering mode eliminates the null values between sorted data words and places the subclasses of resolved conflicts in sequence in the data table.

TECHNICAL FIELD

This invention relates to the sorting of data in electronic data processing systems. In particular, the invention relates to a method for constructing a sorted data file from unsorted data at high speed.

BACKGROUND OF THE INVENTION

Methods of mechanically or electromechanically sorting data into ascending or descending sequence date back to the punch card designed by Herman Hollerith, which was developed in the late nineteenth century. Hollerith was presented with the problem that the U.S. census data gathered in 1880 was never tabulated because so immense an amount of data could not be processed manually. Hollerith, a Census Bureau employee, proceeded to invent machines to tabulate and sort the data for the censuses of 1890 and 1900 and the decades to follow. Improved versions of the Hollerith design continued to be used for the sorting of large amounts of data for many years.

The advent of the modern electronic digital computer has facilitated the creation of computer based sorting methods that have replaced the electromechanical Hollerith-type machines. In the Hollerith-type devices, the punched card sorting techniques were predicated upon sorting one card column at a time, rather than all the punched columns which make up the data word. Through numerous iterations the ranked order of encoded data punched on the cards was determined. The digital computer has been devised with the ability to sort data in an internal memory. This permits digital computers to compare data words rather than just a single data column on each pass. The digital computer is far faster than the Hollerith-type sorting machines since not only can the whole word be comprehended at one time but the electronic computer operates at a far faster speed because it is purely electronic and does not rely on electromechanical handling and sorting devices.

In recent years, various innovative sorting techniques have been devised in an effort to increase the speed of the sorting process. Most of these techniques rely upon some type of iterative process in which the unsorted data is compared, categorized and handled through a varying number of iterations before the ranked order results in a sorted tabulation. For a large table of unsorted data, even modern electronic computers take considerable time to complete this iteration process. Some estimate that over twenty-five percent of the running time of modern computers is spent on the sorting of unsorted data.

It should also be noted that current data sorting techniques are slowed by increased data table size. The increased number of data words and consequent larger data tables to be sorted result in an increase in computer time required per item per table. This time increase per item varies with the method of sorting used but with all current methods, time increases with table size.

In view of the above, there is a need for a method of sorting data by a computer at higher speeds than is currently possible.

It is therefore an object of this invention to provide a high-speed data sorting method that will greatly reduce the amount of time digital computers require to sort large volumes of data.

It is a further object of this invention to greatly reduce the number of time-consuming iterations required to completely process an unsorted data table.

It is yet another object of this invention to provide a data sorting method that reduces the sorting time increase that is generally brought about by enlarging data tables.

It is an advantage of this invention that the improvement in sorting speed greatly increases as table size increases since the time used by this technique increases linearly with table size rather than at some higher order.

SUMMARY OF THE INVENTION

The invention comprises a method of completely sorting an unsorted data table in a single pass through the data. By taking advantage of the tremendous decline in cost of computer memory, the preferred method uses a relatively large amount of computer memory in order to process unsorted data into a sorted sequence at great speed. This extensive use of computer memory space and associated expense is far outweighed by the tremendous increase in sorting speed offered by the method of this invention.

Specifically, in the preferred embodiment of the invention, a computer having a central processor and a memory section is operated to achieve high speed data sorting and storage. Each unsorted data word from an unsorted data table is scaled to a size equal to or less than the number of address locations in a table created to contain the sorted data. The scaled value created for each unsorted data word is converted to an address increment which is added to the initial address of the table created for the sorted data. According to this method, the unsorted data word is then stored into the table for sorted data at the above calculated address that is related to the value of the data word. In this way, most words are placed in a sorted arrangement without the necessity of comparison or iteration.

The preferred embodiment of the invention also comprises a method for resolving conflicts wherein a calculated address for an unsorted data word already contains a sorted data word. In this embodiment of the invention, both words are then sorted with respect to each other to create a data subclass and stored in an auxiliary storage table which is created beyond the location of the sorted data table. An address indicator is then created to specify the location of the data subclass in auxiliary storage. The indicator is devised to show that the subject address in the initial sorted data table contains a further address of a data subclass rather than a data word. In the case of subsequent conflicts, conflicting data is sorted and then moved en masse to yet a further location in auxiliary storage which accommodates the added size of the sorted entries which now comprise a larger data subclass.

A further element of the preferred embodiment comprises a method of gathering and merging the data which has been sorted as described above. It is preferred that when all of the addresses have been calculated and entries stored, a gathering mode is initiated in the computer which eliminates the zeroes or null values and places the subclasses of resolved conflicts in sequence in the data table. The result of this gathering and merging is a completely sorted data table.

The foregoing and other objects and advantages of this invention will be more apparent from the following more particular description of the preferred embodiment of the invention as illustrated in the accompanying drawings, in which like reference characters refer to the same steps and operations throughout the different views. The drawings have been devised merely to emphasize and illustrate the principles of the invention.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is an idealized schematic of a computer system operated in accordance with the principles of this invention;

FIGS. 2A and 2B are idealized schematic representations of a computer memory map of the computer system of FIG. 1;

FIG. 3 is a flow chart which discloses a method of sorting unsorted data words according to their scaled values in conformance with the principles of this invention;

FIG. 4 is a flow chart which discloses a conflict resolution procedure for data words having equal scaled values;

FIG. 5 is a flow chart which discloses a data gathering procedure for use when insufficient unused addresses remain in the data storage tables for insertion of sorted data; and

FIGS. 6 and 7 are flow charts which disclose the gathering and merge procedure which serves to combine all the previously sorted data into a single, compact, sorted data table.

DETAILED DESCRIPTION OF THE INVENTION

This invention relies on the great reduction in cost of computer memory to permit a rapid, single-pass sort of unsorted data. In order to allow for this single-pass sort and required conflict resolution which results in a complete and reliable sort of the unsorted data, large data tables are created in which many addresses remain unused (empty) during the sort. Generally, the most significant digits of the unsorted data words are used to establish a scaled value which is used to identify an address in a data table. The entire data word is then stored at the identified address. Unsorted words having equal scaled values and therefore seeking the same address are separately compared and sorted in data subclasses.

A simplified example of the above-referenced procedure is disclosed for the purpose of most easily illustrating the principles of this invention. In order to clearly illustrate the procedure, certain assumptions are made, all of which can be easily addressed by minor elaborations of this procedure. It is assumed that the unsorted data is in a table of sequentially addressed entries in binary form, as is most common in digital electronic computers. It is further assumed that all entries are positive in value. If, however, some of the entries are negative in value, one merely needs to double the size of the sort tables described below and duplicate the following procedure for the negative numbers. An idealized computer system and memory map which are constructed according to the principles of this invention are shown in FIGS. 1 and 2. In this computer system 1, a central processor unit (CPU) 2 is comprised of a control module 3 and control memory 4. The control memory 4 is for temporary data storage. An unsorted data table 5 will be read into the CPU for sorting. An accumulator register 6 will be used to form scaled values of the unsorted data words as described below. Finally, sort tables are constructed from a memory module 7. An additional memory module 8 may be required for the storage of negative numbers. All these elements have communication links for the transfer of data.

Prior to actually sorting the data, the amount of computer memory that is required for the single pass sort of the data must be determined. As will be clear after the complete discussion of the below procedure, substantially more memory space is required than that in which the unsorted data is stored. A primary sort table, table I in FIG. 2B, is created into which the unsorted data is to be placed according to its scaled value. It is defined as being equal in size to the next power of two greater than the number of storage locations used to store the unsorted data. For example, if the size of the unsorted data table is 1,000 words, then the next higher number having an integral power of two is 1,024, which is equal to the number 2 raised to the tenth power. According to the principles of this invention, auxiliary sort tables II, III and IV, which are used for conflict resolution and gathering, as defined below, account for three more memory tables equal in size to the primary sort table. It is therefore desirable for this method that sufficient memory be available to accommodate four times the primary sort table size. It would certainly be possible to practice this invention, and desirable in certain instances, to use an even larger number of auxiliary sort tables. Using fewer than four sort tables of memory for a single pass sort, however, would be somewhat less efficient for most compilations of unsorted data, although still possible. The sort table size relative to the unsorted data table size as well as the layout of auxiliary sort tables is schematically represented in idealized form in FIGS. 2A and 2B.
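The sizing rule described above can be illustrated with a short sketch. It is offered only as an illustration under stated assumptions: the function names `primary_table_size` and `total_sort_memory` are hypothetical, and a table whose size is already an exact power of two is assumed to keep that size.

```python
def primary_table_size(unsorted_count):
    """Return the smallest power of two that can hold unsorted_count words.

    Assumption: a count that is already a power of two keeps that size.
    """
    size = 1
    while size < unsorted_count:
        size *= 2
    return size


def total_sort_memory(unsorted_count, tables=4):
    """Words of memory for sort tables I to IV, each the size of table I."""
    return tables * primary_table_size(unsorted_count)


print(primary_table_size(1000))  # 1024, i.e. 2 raised to the tenth power
print(total_sort_memory(1000))   # 4096
```

For the 1,000-word example, the four tables together occupy 4,096 word locations.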

Referring now to FIG. 3, which is a flow chart of the single pass sort method, the first step (block 10) is to define the value of "n" for the Shift Right instruction. The value "n" is devised such that the most significant digits of the unsorted data word are used as part of a store instruction in order to store the data word at a scaled address location in the primary sort table. In order to perform this operation, the sort tables have been created with sufficient memory locations to store an entire data word at any possible address employing an address increment formed from those significant digits of the data word. The number "n" must also be devised such that the scaled value formed from the data word will fit within the address portion of a Transfer to Storage instruction. Specifically, "n" is defined to be the power of 2 of the maximum word length minus the power of 2 for the number equal to or immediately greater than the size of the unsorted data table. For example, if the word length is 16 bits, including the sign, then 15 bits remain for the magnitude and every value held by that word is less than the number 2 raised to the fifteenth power. If the size of the sort table is 1,024, or 2 raised to the tenth power (as defined above), then "n" is equal to 15 minus 10, which is 5.

This determination of "n" can be conveniently done by designing a lookup table for the word length at hand. For example, if the unsorted data table size is between 256 and 512, then "n" equals 6 for a 16-bit word length. If the unsorted table size is between 512 and 1,024, then "n" equals 5 for a 16-bit data word. Alternatively, the determination of "n" can be a part of the initial setup since the size of the sort table must be specified at that time.
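As a sketch of the determination of "n" described above, the value can be computed directly rather than looked up. The function name `shift_count` and its parameters are hypothetical, and the word is assumed to carry one sign bit.

```python
def shift_count(unsorted_count, word_bits=16):
    """Return n, the number of low-order bits removed by the Shift Right step.

    Assumptions: one bit of the word is the sign, and the primary sort table
    has the smallest power-of-two size that holds unsorted_count words.
    """
    magnitude_bits = word_bits - 1                   # 15 for a 16-bit word
    table_bits = (unsorted_count - 1).bit_length()   # 10 for up to 1,024 words
    return magnitude_bits - table_bits


print(shift_count(1000))  # 5
print(shift_count(400))   # 6
```

With a 16-bit word, table sizes above 256 and up to 512 give "n" equal to 6, and sizes above 512 and up to 1,024 give "n" equal to 5, matching the lookup values quoted above.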

The next step in this method (block 12) is to read an unsorted word into an accumulator register 6 (FIG. 1) from memory. If the word is zero or negative, the word is stored unsorted in memory 8 (FIG. 1) for separate processing by the identical procedure for negative words (blocks 14 and 16). The control then passes to "index word" (18) and the next word is read. The use of the term "control" is only an aid in following the flow diagrams of FIGS. 3-7; in actual operation the control module 3 of the computer always controls the operations.

If the word is positive, then the Shift Right "n" instruction is given according to block 20. This instruction shifts the data word to the right and eliminates "n" columns from the right (least significant) side of the word. This has the effect of dividing the value of the word by 2 raised to the nth power. Thus a scaled value is created that fits within a range defined by the beginning and end of the primary sort table addresses. This value can be considered an address increment. By way of example, if the sign value is ignored and the data word read were to be 123456 and the Shift instruction were given for "n" having a value of 2, the scaled value would now be equal to 1234. In binary, of course, this would be the same as dividing the data word value by 4 (101010 shifted by n=2 gives 1010). In this way, the scaled value is formed from the most significant bits of the binary data word. In the Shift Right instruction, an obvious requirement is that "n" be set such that after shifting, the address increment created will be sufficiently small to fit within the address portion of a store instruction.

The next step (22) is to create a new address by adding the above constructed address increment to a store instruction whose address portion is the beginning address of the sort table. Often, the primary sort table constructed in memory will begin at an address other than 0000, as shown. By adding the address increment to the address portion of the storage instructions, a new address is developed. In this case, the exemplary binary address increment is 1010 and is added to address portion 0000 to produce scaled address 1010. Thus this method prepares to store the unsorted data word, in its entirety, in a sort table address which is scaled to the value of the unsorted data word. Since these scaled values may occasionally recur, a test must be performed to determine if there is already an entry stored at the identified address.
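The shift and address calculation of blocks 20 and 22 can be sketched as follows. The function name `scaled_address` and the parameter `base` (the beginning address of the primary sort table) are hypothetical names chosen for illustration.

```python
def scaled_address(word, n, base=0):
    """Form the sort table address for a positive data word.

    The word is shifted right by n bits (dividing it by 2 raised to the nth
    power) and the resulting address increment is added to the table's
    beginning address, here called base.
    """
    increment = word >> n
    return base + increment


# The binary example from the text: 101010 shifted by n=2 gives 1010.
print(bin(scaled_address(0b101010, 2)))  # 0b1010
```

For a 16-bit word with "n" equal to 5, the largest positive word 32767 maps to increment 1023, the last location of a 1,024-entry table.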

According to step 24, the identified address is tested to see whether an entry has already been placed there. This indication is signified by either a non-zero entry at the address, if this is the first time there is a duplicate scaled value, or a special indicator such as a negative sign bit which indicates a previous conflict between equal scaled values. Any indication that something has already been placed in the new address signifies a conflict which must be resolved. This conflict resolution procedure is described in detail below and illustrated in the flow diagram of FIG. 4.

Absent the conflict, the procedure is to store the full data word in the sort table at the new address per block 26. Following the storage, appropriate indexing is made to the next unsorted word according to block 18. Preparation is then made for processing the next unsorted word. At block 18 a test is also made to determine if all the unsorted words have been processed. If not, control returns to block 12 and the next unsorted word is read into memory and processed.

If all the unsorted words have been processed, the procedure is then to gather the sorted words into a single sort table, according to step 28. This gathering and merge procedure is defined in detail below and illustrated in FIGS. 6 and 7. At the end of the processing, the single pass sort procedure is completed.

Referring to FIG. 4, which illustrates the conflict resolution procedure for data words having equal scaled values, the procedure will now be described in detail. Assuming that there is a conflict, i.e., multiple data words seeking the same address, the routine is shifted from block 24 to block 30 of FIG. 4, as shown by arrow 25. At block 30, it is determined whether this is the first conflict to be resolved in this address location. If it is the first conflict at this location, it is necessary to construct a special indicator for this location, block 32. This special indicator, in the case of sorting positive numbers, could, for example, include a negative sign in the sign position and a "go to" (Goto) address in the remainder of the word. This address, the subject of the Goto instruction, is obtained from a next available address counter, which keeps track of the next available storage addresses in an auxiliary storage table. The next available address counter keeps track of where the data in conflict may be stored in the auxiliary storage. These conflicts are thereby added on to the end of the table beyond primary sort table I, in which is placed non-conflicting sorted data. Such data conflicts are thereby stored sequentially in auxiliary storage following sort table I (FIG. 2B). Specifically, the conflicts occupy the storage locations specified as auxiliary sort table II and, prior to any gathering, auxiliary sort table III.

Proceeding to block 34, a data subclass is created from the two conflicting data words that had been seeking the same address. This first conflict is relatively simple since there are only two words in conflict at this point and is merely a matter of storing them in order depending on which is greater. They can thereby be placed in ascending order through direct comparison. A third word, called a limit word, is added to the data subclass in preparation for storage at the next available address (36). The limit word is constructed of special characters not found in the data such as all X's or all "@" typed characters. The purpose of the limit word is to tell the gathering routine that it has reached the end of the data subclass. After this instruction is completed, control is passed from block 36 to block 38, which tests for whether sufficient auxiliary storage is available for storage of the data subclass without writing over previously stored data.

Another path is followed if this is not the first conflict between data words seeking the same address, which is tested for in block 30. If the answer to the question is "No," control is passed to block 40 which follows the path dictated in the register into which the conflict is directed. The special indicator and Goto address instruction are read from the conflict address and used to determine the location of the previously stored data in conflict. In other words, this tells the program where to find the previously stored data subclass that resolved the earlier conflict. At block 42, the next available address is obtained from the next available address counter as explained previously so as to identify the location for the new data subclass. This next available address is inserted with a Goto code into sort table I to replace the previous Goto address stored there when the prior conflict was resolved.

At block 44, the program forms a new data subclass from the conflicting data words by inserting the new data word in the appropriately sorted location. This procedure is a straightforward one in which each word of the old data subclass is rewritten into a new location. At each rewrite a test is made to determine whether or not the new word to be sorted should be inserted due to its numerical value. Following the rewrite, all the data words that were previously seeking the same primary sort address due to equal scaled values are properly sorted and a limit word is appended to complete a new data subclass. Note that this new data subclass is stored beginning at the new address in auxiliary storage as determined by the next available address counter. Further, the old data subclasses still exist in auxiliary storage but are ignored in further processing since the Goto address in the primary sort table now leads the program to the newest data subclass.
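The rewrite of block 44 can be sketched as a single walk over the old data subclass. This is a minimal illustration rather than the disclosed implementation: the name `rebuild_subclass` is hypothetical, and the limit word is represented by -1 on the assumption that all data words are positive.

```python
LIMIT = -1  # stand-in for the limit word; assumes data words are positive


def rebuild_subclass(old_subclass, new_word):
    """Copy an old subclass to a new list, inserting new_word in order.

    At each rewrite a test decides whether new_word belongs before the word
    being copied. A limit word is appended to close the new subclass.
    """
    new_subclass = []
    inserted = False
    for word in old_subclass:
        if word == LIMIT:                 # end of the old subclass
            break
        if not inserted and new_word < word:
            new_subclass.append(new_word)
            inserted = True
        new_subclass.append(word)
    if not inserted:                      # new_word is the largest so far
        new_subclass.append(new_word)
    new_subclass.append(LIMIT)
    return new_subclass


print(rebuild_subclass([40, 44, LIMIT], 42))  # [40, 42, 44, -1]
```

The old list is left untouched, which mirrors the text: the old data subclass remains in auxiliary storage and is simply no longer referenced.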

After the new data subclass has been formed and the limit word appended in blocks 44 and 36, data subclasses created by either path described above are held in the control memory 4 of the central processing unit 2, while a test is made as to whether or not the table limit has been reached (38). This table limit is defined as the end of table III prior to any intermediate gatherings of data and the end of table II thereafter, as explained below, in order to avoid overwriting sorted data. If the limit has not been reached, the new data subclass, which is composed of two or more data words depending on prior conflict, is stored starting at the next available address, block 46. Thereafter, the next available address counter is indexed (48) according to the size of the data subclass, and control passes back to block 18 of FIG. 3. At block 18, assuming there are more words to be sorted, the unsorted data is indexed and the next word is read into the accumulator register for sort.
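The store and conflict paths of FIGS. 3 and 4 can be drawn together in one sketch. Several representation choices here are assumptions made for illustration only: an empty address holds 0, the special indicator is the negative number -(address + 1), the limit word is -1, the auxiliary storage is a growing Python list whose length serves as the next available address counter, and the table limit test of block 38 is omitted. The standard library call `bisect.insort` stands in for the word-by-word rewrite.

```python
import bisect

EMPTY = 0    # unused address
LIMIT = -1   # stand-in for the limit word


def place(word, n, primary, aux):
    """Store one positive word, resolving any conflict in auxiliary storage."""
    slot = word >> n
    entry = primary[slot]
    if entry == EMPTY:                    # no conflict: store the full word
        primary[slot] = word
        return
    if entry < 0:                         # special indicator: earlier conflict
        start = -entry - 1                # Goto address of the old subclass
        end = aux.index(LIMIT, start)
        subclass = aux[start:end]
    else:                                 # first conflict at this address
        subclass = [entry]
    bisect.insort(subclass, word)         # keep the subclass in ascending order
    next_available = len(aux)             # next available address counter
    primary[slot] = -(next_available + 1)
    aux.extend(subclass + [LIMIT])        # counter advances by subclass size


primary, aux = [EMPTY] * 16, []
for w in [42, 40, 7, 43]:
    place(w, 2, primary, aux)
print(primary[1], primary[10])  # 7 -4
print(aux)                      # [40, 42, -1, 40, 42, 43, -1]
```

The first subclass (40, 42) stays in auxiliary storage after the second conflict, but the indicator in the primary table now points only to the newer subclass (40, 42, 43).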

If, however, the auxiliary storage table limit has been reached (38), then an intermediate gathering must take place. Control therefore passes from block 38 along arrow 49 onto the flow diagram of FIG. 5. Block 50, representing CPU memory, preserves the data subclass that was created according to the logical steps of FIG. 4 for later insertion into the sorted data. The program then checks (52) whether this is the first time that the storage of data in conflict has reached the end of the designated space, which is to say the end of sort table III (FIG. 2B). If the answer to the question is "Yes," i.e., there has been no prior intermediate gathering, then the next step (54) is to gather the sorted data, eliminating zeroes (unused addresses). This gathering procedure is defined in detail below in reference to FIGS. 6 and 7. The result of this gathering is stored in auxiliary sort table IV, as shown in step 56, and the program returns to block 18.

If, however, this is not the first time there has been an intermediate gathering, the procedure is a little different following block 52, since the limit of auxiliary storage of data subclasses is the end of table II. This is because prior gatherings will have resulted in a merge table being created in table III or IV. The program must therefore first determine whether the data from the last intermediate gathering was stored in table III or table IV (step 60) in order to avoid overwriting previously merged data. If the answer is that the data was stored in table III, then sorted data is gathered from tables I and II and merged with data from table III. The merged data is then stored in table IV (block 62) and tables I, II and III are cleared of data. As may be seen, the next step is to return the program to block 18 of FIG. 3. If the answer to the question asked at block 60 is "No," then the data gathered in the last intermediate merge was stored in table IV. As a result, the program (block 66) gathers the data from tables I and II and merges it with that already existing in table IV. The result is then stored in table III and tables I, II and IV are cleared of data. In all of the above gathering scenarios, the last derived data subclass, which has been preserved in CPU memory according to block 50, is inserted into the merged and gathered data at its correct location. After the data has been gathered and merged and is inserted into table III, the program once again returns to block 18. Thus, at the end of these procedures, the intermediate merge of sorted data is stored in sequence in either auxiliary sort table III or sort table IV.
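The alternation between tables III and IV described for FIG. 5 can be summarized in a small decision sketch. The function name `plan_intermediate_gather` and the dictionary keys are hypothetical; only the choice of source and destination tables is taken from the text.

```python
def plan_intermediate_gather(last_merge_table=None):
    """Choose the tables for the next intermediate gathering.

    last_merge_table is None before any gathering, else "III" or "IV".
    """
    if last_merge_table is None:
        # First gathering: conflicts occupy tables II and III.
        return {"gather": ("I", "II", "III"), "merge_with": None,
                "store_in": "IV"}
    if last_merge_table == "III":
        return {"gather": ("I", "II"), "merge_with": "III",
                "store_in": "IV"}
    if last_merge_table == "IV":
        return {"gather": ("I", "II"), "merge_with": "IV",
                "store_in": "III"}
    raise ValueError("last_merge_table must be None, 'III' or 'IV'")


print(plan_intermediate_gather("IV")["store_in"])  # III
```

Writing each merge into the table that does not hold the previous merge is what prevents previously merged data from being overwritten.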

Turning now to FIGS. 6 and 7, which should be considered together, the gathering and merging procedure will be considered in detail. Intermediate gathering is performed to assemble all the data that has been sorted up to that point in the process. This is accomplished by sequentially addressing and reading the data at the primary sort table. Links are established by the Goto instruction to gather stored data in the auxiliary sort tables. In other words, this gathering is required when the sort tables following sort table I are filled with data added from conflicts and no further storage space is allocated from memory. Thus, this compression technique is implemented to sequentially link the data stored up to that point and eliminate unused storage locations. The result of an intermediate gathering is a table of sorted data (merge table) which must later be merged with subsequently sorted data.

This data compression is also required as the final step in the single pass sort process. The gathering is employed to compress all the sorted data into one table and to finally merge all the auxiliary data tables into the sorted data table. This includes inserting all the data subclasses that have been created.

Referring now to FIG. 6, block 72, a sorted word is read from the primary table for sorted data. In the event that this word is a zero or a null entry, it is ignored and control is routed by way of block 74 to index block 76. The program is thereby indexed and the next word in order is read in from the primary sort table.

For non-zero entries, wherein data or a special indicator has been inserted into the address, the next question to be asked at step 78 is whether or not the special indicator has been inserted at this word location. If a special indicator has been inserted therein, it indicates that there was a conflict between equivalent scaled values and the program proceeds to the location where the conflict is stored (block 92, FIG. 7). This is facilitated by the insertion of the address of the data subclass after the special indicator. The first data word of the data subclass is then read in as described below.

Assuming there is no special indicator, the data word is read and the program proceeds to block 82, which questions whether there is data to be merged with the sort table data. If there is no data to be merged, then the data word ("this word") is stored in the next new merge table location, according to block 84. This is followed by an index to the next word in the primary sort table, at block 76. If the data word from the primary sort table is to be merged with a table created by an intermediate gathering as discussed above, then a comparison is necessary between the data word ("this word") and words from the intermediate merge table which is to be merged into the new merge table. This is done at block 86 by determining whether "this word" is less than the next merge word. If the answer to the question is "Yes," then the control transfers back to block 84 and the data "this word" is stored since in the preferred embodiment the data is sorted into ascending order. If the answer is "No," however, then the word from the merge table is stored, and the next word in the merge table is indexed as shown in blocks 88 and 90. Assuming there is more data to be found in the merge table, the process iterates back to block 86 to compare the merge word with "this word" from the sort table. When all data from the merging table have been processed, or when the sort word ("this word") is less than the merge word, the program transfers to block 84, which stores "this word" and leads the program to index to the next word from the sort table.
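The comparison loop of blocks 84 to 90 can be sketched as a two-way merge. The function name `gather_with_merge` is hypothetical, and both inputs are assumed to be already in ascending order. Appending any merge words that remain after the sort table is exhausted is an assumption, since the text does not spell out that step.

```python
def gather_with_merge(sort_words, merge_table):
    """Merge words gathered from the sort table with an earlier merge table."""
    new_merge_table = []
    m = 0                                     # index into the old merge table
    for this_word in sort_words:
        # Store merge words until "this word" is less than the next merge word.
        while m < len(merge_table) and not (this_word < merge_table[m]):
            new_merge_table.append(merge_table[m])
            m += 1
        new_merge_table.append(this_word)
    new_merge_table.extend(merge_table[m:])   # assumption: flush the remainder
    return new_merge_table


print(gather_with_merge([3, 8, 20], [1, 5, 8, 30]))
# [1, 3, 5, 8, 8, 20, 30]
```

Equal words are both kept here, which corresponds to the option of storing duplicates in sequence rather than eliminating one of them.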

If all the words from the sort table have been sorted into the new merge table (DONE?), the CPU exits the program to block 77. At block 77, the program tests to see if all the unsorted words have been read in from the unsorted data table 5 (DONE?). If they have, the single pass sort is complete. If not, the program returns to block 18 and continues to read in the unsorted words. When all the unsorted data has been read, all the data words will have been entered in sequential order in the last merge table to be created.

In FIG. 7, a schematic diagram is used to explain how the gathering and merge procedure deals with a special indicator inserted into an address. At block 92, the program is directed to the address containing the data in conflict, constituting the beginning of a data subclass, and then to blocks 94 and 96 where data words are read in sequence. When the data word from the data subclass is read in, it is tested as to whether it is data to be sorted or is a limit word (98). If the value is a limit word, then control is passed back to block 76 of FIG. 6. This would not be the case if the computer has just entered this section of the program, having been directed there by a special indicator. If, therefore, the value read out of the data subclass is not a limit word, the program must next check whether merging is to take place simultaneously with the gathering of the data (100). If the answer at block 100 is "No," control proceeds to block 102 where the word from the data subclass is stored in the next location on the merge table. Thereafter, an indexing takes place to the next word in the data conflict subclass (104). The program will work through the data subclass until it reaches a limit word and exits this portion of the program.
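The gathering walk of FIGS. 6 and 7, for the case where no earlier merge table exists, can be sketched as follows. The representation is an assumption made for illustration: an empty address holds 0, the special indicator is the negative number -(address + 1), and the limit word is -1. The function name `gather` is hypothetical.

```python
EMPTY = 0    # unused address
LIMIT = -1   # stand-in for the limit word


def gather(primary, aux):
    """Collect sorted words from the primary table and its data subclasses."""
    merge_table = []
    for entry in primary:
        if entry == EMPTY:                # zero or null entry: ignore
            continue
        if entry < 0:                     # special indicator: follow the Goto
            address = -entry - 1
            while aux[address] != LIMIT:  # read the subclass up to its limit
                merge_table.append(aux[address])
                address += 1
        else:
            merge_table.append(entry)     # ordinary data word
    return merge_table


primary = [EMPTY] * 16
primary[1], primary[10], primary[12] = 7, -4, 50
aux = [40, 42, LIMIT, 40, 42, 43, LIMIT]
print(gather(primary, aux))  # [7, 40, 42, 43, 50]
```

The stale subclass at auxiliary addresses 0 to 2 is never visited, because no indicator in the primary table points to it any longer.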

In the situation where intermediate gatherings have taken place, there will be a merge table that must be integrated with the gathering procedure. Therefore, if the answer to block 100 is "Yes," then control is passed to block 106 where the word from the data subclass ("this word") is compared to the next word from the merge table. If "this word" is less than the next word from the merge table, the program exits the subroutine and stores "this word" in the next sort table location (102). If, however, "this word" is greater than the next merge word, the next merge word is stored first and the merge table is indexed as shown in blocks 108 and 110. This will continue to occur until the word from the data subclass ("this word") is less than the next merge word, at which point "this word" is stored. All the data words are thereby put in sequential order. Optionally, the program will exit the subroutine when the merge table end has been reached. This, of course, is similar to the merge routine as described in reference to FIG. 6 above. A further option may be required if the unsorted data table contains data words of equal value. In this case, either one of the data words is eliminated or both are stored arbitrarily in sequence.

Thus, when these two last subroutines are finally complete, all the data words will have been inserted into a single data table in sequential order after only a single pass through the unsorted data table. The multiple iterations of the prior art are thereby eliminated. While this procedure does use considerably more memory than prior art procedures, one should note that the immediate problem in modern computers is speed as opposed to memory. The sort procedure of this invention has been tested against conventional sorts in a personal computer. A relatively random unordered data file was obtained by taking the last four digits from selected pages of the Greater Boston Telephone Directory White Pages. One hundred numbers, or approximately one column, were used from each of ten different pages distributed throughout the directory. These numbers appear to be randomly distributed, ranging from 0003 to 9999. In this test, the sorting routine of this invention ranged from three to four times faster than conventional sorting routines for an unsorted table of 100 numbers to one hundred fifty times faster for an unsorted table of 1,000 numbers.

While the invention has been particularly described with reference to the preferred embodiment thereof, it will be understood by those skilled in the art that various changes in substance and form can be made therein without departing from the spirit and scope of this invention as detailed in the attached claims.

I claim:
1. Digital data processor apparatus for sorting a plurality of unsorted data words into sequential order, said digital data processor comprising:

A. at least four storage table means--referred to as first storage table means, second storage table means, third storage table means, and fourth storage table means, respectively--each capable of storing a plurality of data words at addressable locations therein,

B. address-forming means for generating an address for each of successive candidate ones of said unsorted data words, where each such data word has a value that can be represented by a plurality of digits arranged in order of increasing significance, and where each such address is generated by shifting the digits representing the value of the corresponding data word to remove therefrom one or more digits of least significance,

C. primary sort means, coupled to said address-forming means, for testing said first storage table means at a location corresponding to the address of a candidate data word to determine whether that location is occupied by at least one prior data word, said primary sort means including means for responding to a negative such determination for storing that candidate data word in said first storage table means at that location, said primary sort means further including means for responding to a positive such determination for (i) storing said candidate data word and said at least one prior data word in said second storage table means at a next available addressable location therein, and (ii) storing in said first storage table, at said location corresponding to the address of the candidate data word, a signal representative of that next available addressable location,

D. merge/gathering means, coupled to said at least four storage table means, for determining whether a remaining storage capacity in said second storage table means is sufficient and, if not, for storing in a selected, alternating one of said third storage table and said fourth storage table means a sorted list of data words formed by merging data words stored in the others of said storage table means, and for clearing said others of said storage table means to permit storage therein of further data words.